ABS labour statistics are drawn from four key types of data sources, or “pillars” of data,
which provide complementary insights into the labour market. These are:


• household surveys - individual households answer labour market questions about their
individual, family or household circumstances (e.g. the monthly Labour Force Survey)

• business surveys - collect a broad range of information from businesses about jobs and
employees (e.g. the Survey of Employee Earnings and Hours, Job Vacancies Survey)

• administrative data - information maintained by governments (such as taxation data)
and other entities made available to the ABS for statistical purposes (e.g. as published in
Weekly Payroll Jobs)

• accounts compilation - bringing together data from separate administrative, business,
and household sources to produce an Australian Labour Account


[Figure: Australian Labour Market, showing the four pillars: Household Survey, Business Survey, Administrative Data, Labour Accounts]


Sample surveys versus censuses 


The ABS uses both sample surveys and censuses to collect information from a population 
about characteristics of interest. In the field of labour statistics, the ABS uses sample 
surveys of households and businesses, as well as censuses (such as the Industrial Disputes
collection).


Censuses involve the collection of information from all units in the target population, while 
sample surveys involve the collection of information from only a part (sample) of the target
population.


Sample surveys have both advantages and disadvantages when compared with censuses. 
Some advantages are reduced costs (as less time is needed to collect, process and produce 
data), possible reductions in non-sampling error (this concept is discussed in further detail 
later in this chapter), improved timeliness, and the potential to gather more detailed
information from each respondent.


A disadvantage of sample surveys is that estimates are subject to sampling error, which 
occurs because data were obtained from only a sample rather than the entire population 
(this concept is discussed in further detail later in this chapter). Also, as a result of obtaining 
only a small number of observations in particular geographical areas and sub-populations,
detailed cross-tabulations may be subject to high levels of error and be of limited use.


Censuses are generally used when broad-level information is sought for many fine
sub-groups of the population, whereas sample surveys are used to collect detailed information
to estimate for broader levels of the population.


Sample design and sampling techniques 


ABS labour-related household and business sample surveys use probability sampling 
techniques, drawing their samples from a population frame. This section briefly defines and 
explains key concepts and terms related to survey design. See the household and business 
surveys sections for more detail on aspects of survey design that are particular to these
types of surveys.

Population


A survey is concerned with two types of population: the target population, and the survey 
population. The target population is the group of units about which information is sought, 
and is also known as the scope of the survey. It is the population at which the survey is
aimed. The scope should state clearly the units from which data are required and the extent
and time covered, e.g. households (units) in Australia (extent) in August 2020 (time).


However, the target population is a theoretical population, as there are usually a number of 
units in the target population which cannot be surveyed. These include units which are 
difficult to contact and units which are missing from the frame. The survey population is 
that part of the population that is able to be surveyed, and is also called the coverage
population.

Statistical units


Statistical units are used in the design, collection, analysis and dissemination of statistical 
data. There are several types of units, including: sampling units (the units selected in the 
sample survey), collection units (the units from which data are collected), reporting units 
(the units about which data are collected), and analysis units (the units used for analysis of 
the data). The units used in a survey may change at various stages in the survey cycle. For 
example, the Labour Force Survey uses a sample of households (sampling unit) from which 
information is collected from any responsible adult (collection unit) about each person in 
the household in scope of the survey (reporting units). The results of the survey may then
be analysed for families (analysis unit).

Frames


The frame comprises a list of statistical units (e.g. persons, households or businesses) in the 
population, together with auxiliary information about each unit. It serves as a basis for
selecting the sample. Two types of frames are used in ABS labour-related surveys:


• List based frames - List based frames comprise a list of all sampling units in the survey
population. List based frames are commonly used in surveys of businesses. ABS
business surveys currently draw their list frames from the ABS Business Register.

• Area based frames - Area based frames comprise a list of non-overlapping geographic
areas. These areas may be defined by geographical features such as rivers and streets.
They are usually used in household surveys. Once an area is selected, a list is made of
the households in the area, and a sample of households selected from the list. Examples
of geographic areas that may be used to create area frames include: local government
areas; census collection districts; and postcodes.


Auxiliary variables are characteristics of each unit for which information is known on the 
frame prior to the survey. Auxiliary variables can be used in the sample design to better 
target the population of interest, if the information on the frame is of sufficiently high 
quality and is correlated with the variables of interest in the survey. Auxiliary variables
(for example, the industry of a business) can also be used in the estimation process in
conjunction with the survey data.


For most sampling methodologies, it is desirable to have a complete list from which to select 
a sample. However, in practice it can be difficult to compile such a complete list and 
therefore frame bias may be introduced. Frame bias occurs when an inappropriate frame is 
used or there are problems with the composition of the frame, with the result that the 
frame is not representative of the target population. Frames become inaccurate for many 
reasons. One of the most common problems is that populations change continuously, 
causing frames to become out of date. Frames may also be inaccurate if they are compiled 
from inaccurate sources. The following are some of the problems that can occur in the
composition of frames.


Under-coverage occurs when some units in the target population that should appear on the
frame do not. These units may have different characteristics from those units which appear
on the frame, and therefore results from the survey will not be representative of the target
population.


Out of scope units are units that appear on the frame but are not elements of the target 
population. Selection of a number of out of scope units in the sample reduces the effective 
sample size, and increases sampling error. Furthermore, out of scope units appearing on 
the frame may be incorrectly accounted for in the estimation process, which may lead to
bias in survey estimates.


Duplicates are units that appear more than once on the frame. The occurrence of duplicates 
means that the probability of selection of the units on the frame is not as it should be for 
the respective sample design. In particular, the duplicate units will have more than the 
correct chance of selection, introducing bias towards the characteristics of these units.
Duplicates also increase sampling error.


Deaths are units that no longer exist in the population but are still on the frame. Deaths
have the same impact on survey results as out of scope units.


The quality of auxiliary variables can affect the survey estimates of the variables of interest,
through both the survey design and the estimation process.


The ABS attempts to minimise frame problems and uses standardised sample and frame 
maintenance procedures across collections. Some of the approaches taken are to adjust 
estimates using new business provisions, and to standardise across surveys the systems for
handling estimation, imputation and outliers.

Probability samples


Probability samples are samples drawn from populations such that every unit in the 
population has a known, or calculable, non-zero probability of selection which can be 
obtained prior to selection. In order to calculate the probability of selection, a population 
frame must be available. The sample is then drawn from this frame. Alternatives to
probability samples are samples formed without a frame, such as phone-in polls.


Probability sampling is the preferred ABS method of conducting major surveys, especially 
when a population frame is available. Probability samples allow estimates of the accuracy of 
the survey estimates to be calculated. They are also used in ABS surveys as a means of 
avoiding bias in survey results. Bias is avoided when either the probability of selection is 
equal for all units in the target population or, where this is not the case, the effect of
unequal probabilities is allowed for in estimation.

Stratified sampling


Stratified sampling is a technique which uses auxiliary information available for every unit 
on the frame to increase the efficiency of a sample design. Stratified sampling involves the 
division (stratification) of the population frame into non-overlapping, homogeneous (similar) 
groups called strata, which can be treated as totally separate populations. A sample is then 
selected independently from each of these groups, and can therefore be selected in 
different ways for different strata, e.g. some strata may be sampled using ‘simple random 
sampling’ while others may be ‘completely enumerated’. These terms are explained below. 
Stratification variables may be geographical (e.g. State, capital city/balance of State) or
non-geographical (e.g. number of employees, industry, turnover).


All surveys conducted by the ABS use stratification. Household surveys use mainly 
geographic strata. Business surveys typically use strata which are related to the economic 
activity undertaken by the business, for example industry and size of the business (the latter
based on employment size).


Completely enumerated strata 


Completely enumerated strata are strata in which information is obtained from all units. 
Strata that are completely enumerated tend to be those where: each population unit within 
the stratum is likely to contribute significantly to the estimate being produced (such as 
strata containing large employers where the estimate being produced is employment); or
there is significant variability across the population units within the stratum.

Simple random sampling


Simple random sampling is a probability sampling scheme in which each possible sample of 
the required size has the same chance of selection. It follows that each unit of the
population has an equal chance of selection.
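
The selection steps described so far (independent selection within strata, completely enumerated strata, and simple random sampling) can be sketched in a few lines. This is a minimal illustration only: the frame layout, the stratum names and the "CE" marker are assumptions made for the example, and it does not reproduce any ABS production system.

```python
import random

def select_stratified_sample(frame, allocation, seed=0):
    """Select an independent sample from each stratum of a frame.

    frame: list of (unit_id, stratum) pairs (hypothetical layout).
    allocation: dict mapping stratum -> sample size, or the string "CE"
        for a completely enumerated stratum (hypothetical marker).
    """
    rng = random.Random(seed)

    # Divide the frame into non-overlapping strata.
    strata = {}
    for unit_id, stratum in frame:
        strata.setdefault(stratum, []).append(unit_id)

    sample = {}
    for stratum, units in strata.items():
        size = allocation[stratum]
        if size == "CE":
            # Completely enumerated: every unit in the stratum is selected.
            sample[stratum] = list(units)
        else:
            # random.sample draws without replacement, so no unit repeats.
            sample[stratum] = rng.sample(units, size)
    return sample

# Illustrative frame: many small businesses, a handful of large ones.
frame = [(f"small-{i}", "small") for i in range(200)]
frame += [(f"large-{i}", "large") for i in range(5)]
sample = select_stratified_sample(frame, {"small": 20, "large": "CE"})
```

Here the "large" stratum is completely enumerated because each of its units would contribute heavily to an employment estimate, while the "small" stratum is sampled at a rate of 1 in 10.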


Simple random sampling can involve units being selected either with or without 
replacement. With replacement sampling allows a unit to be selected multiple times, whereas
without replacement sampling allows a unit to be selected only once. In general, simple
random sampling without replacement produces more accurate results as it does not allow
sample to be 'wasted' on duplicate selections. All ABS surveys that use simple random
sampling use the 'without replacement' variant. Simple random sampling without
replacement is used in most ABS business surveys.

Systematic sampling


Systematic sampling is used in most ABS household surveys, and provides a simple method 
of selecting the sample. It involves choosing a random starting point within the frame and
then applying a fixed interval (referred to as the 'skip') to select members from the frame.


Information on auxiliary variables can be used in systematic sampling to improve the 
efficiency of the sample. The units in the frame can be ordered with respect to auxiliary 
variables prior to calculating the skip interval and starting point. This approach ensures that
the sample is spread throughout the range of units on the frame, giving a more
representative sample with respect to the auxiliary variable.
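
The random start and skip can be illustrated as follows. This is a simplified sketch: it assumes an integer skip (frame size divided by sample size, rounded down) and ignores any units left over at the end of the frame; the function and parameter names are hypothetical.

```python
import random

def systematic_sample(frame, sample_size, key=None, seed=0):
    """Select every skip-th unit from a frame after a random start.

    key: optional function returning the auxiliary variable used to order
        the frame before selection (hypothetical parameter).
    """
    rng = random.Random(seed)
    # Ordering by an auxiliary variable spreads the sample across its range.
    ordered = sorted(frame, key=key) if key is not None else list(frame)
    skip = len(ordered) // sample_size  # fixed interval between selections
    if skip < 1:
        raise ValueError("sample_size cannot exceed the frame size")
    start = rng.randrange(skip)  # random starting point in [0, skip)
    return [ordered[start + i * skip] for i in range(sample_size)]

# Illustrative frame of 100 dwellings identified by number.
dwellings = list(range(100))
selected = systematic_sample(dwellings, 10)
```

With 100 units and a sample of 10, the skip is 10, so the selected dwellings are exactly 10 positions apart on the ordered frame.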


Systematic sampling with ordering by auxiliary variables is only useful if the frame contains 
auxiliary variables about each of the units in the population, and if these variables are 
related to the variables of interest. The relationship between the variables of interest and 
the auxiliary variables is often not uniform across strata. Consequently, it is possible to
design a sample survey with only some of the strata making use of auxiliary variables.

Probability proportional to size sampling


Probability proportional to size sampling is a selection scheme in which units in the
population do not all have the same chance of selection. With this method, the larger the
unit with respect to some measure of size, the greater the probability that unit will be
selected in the sample. Probability proportional to size sampling will lead to unbiased
estimates, provided the different probabilities of selection are accounted for in estimation.
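
The following sketch shows one way the unequal selection probabilities can be accounted for in estimation. For simplicity it assumes selection with replacement and divides each selected value by its draw probability (the Hansen-Hurwitz form of estimator); the source does not state which variant is used in ABS surveys, and all names and figures are hypothetical.

```python
import random

def pps_sample(sizes, n, seed=0):
    """Draw n units with probability proportional to size, with replacement.

    sizes: dict mapping unit -> measure of size (hypothetical layout).
    Returns the selected units and each unit's per-draw probability.
    """
    rng = random.Random(seed)
    total_size = sum(sizes.values())
    units = list(sizes)
    probs = {u: sizes[u] / total_size for u in units}
    chosen = rng.choices(units, weights=[sizes[u] for u in units], k=n)
    return chosen, probs

def estimate_total(chosen, probs, values):
    """Estimate a population total, allowing for unequal probabilities."""
    # Each selected value is divided by its selection probability, then
    # the results are averaged over the n draws.
    return sum(values[u] / probs[u] for u in chosen) / len(chosen)

# Illustrative data: size is frame employment, value is reported earnings.
sizes = {"a": 10, "b": 40, "c": 50}
values = {"a": 20.0, "b": 80.0, "c": 100.0}
chosen, probs = pps_sample(sizes, n=2)
estimate = estimate_total(chosen, probs, values)
```

In this example the reported values are exactly proportional to size, so the estimate equals the true total of 200 whichever units are drawn. That is the situation in which probability proportional to size sampling is most efficient.
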
Cluster sampling 


Cluster sampling involves the units in the population being grouped into convenient 
clusters, usually occurring naturally. These clusters are non-overlapping, well-defined 
groups which usually represent geographical areas. The sample is selected by selecting a 
number of clusters, rather than directly selecting units. All units in a selected cluster are
included in the sample.

Multi-stage sampling


Multi-stage sampling is an extension of cluster sampling. It involves selecting a sample of 
clusters (first-stage sample), and then selecting a sample of population units within each 
selected cluster (second-stage sample). The sampling unit changes at each stage of 
selection. Any number of stages can be employed. The sampling units for any given stage of 
selection each form clusters of the next-stage sampling units. Units selected in the final 
stage of sampling are called final-stage units (or ultimate sampling units). The Survey of 
Employee Earnings and Hours uses multi-stage sampling - businesses (the first-stage units) 
selected in the survey are asked to select a sample of 'employees' (the final-stage units)
using employee payrolls. Household surveys also use multi-stage sampling.
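
A two-stage selection of the kind described for businesses and employees can be sketched as below. The sketch assumes simple random sampling without replacement at both stages, which is an assumption made for illustration rather than a description of the actual survey design; the payroll layout and names are hypothetical.

```python
import random

def two_stage_sample(payrolls, n_businesses, n_employees, seed=0):
    """Select businesses (first stage), then employees within each (final stage).

    payrolls: dict mapping business -> list of employee ids (hypothetical).
    Returns (business, employee, weight) tuples, where the weight is the
    inverse of the overall probability of selecting that employee.
    """
    rng = random.Random(seed)
    first_stage = rng.sample(sorted(payrolls), n_businesses)
    selections = []
    for business in first_stage:
        payroll = payrolls[business]
        take = min(n_employees, len(payroll))
        # Overall probability = P(business selected) * P(employee | business).
        prob = (n_businesses / len(payrolls)) * (take / len(payroll))
        for employee in rng.sample(payroll, take):
            selections.append((business, employee, 1 / prob))
    return selections

# Illustrative payrolls for eight businesses of differing size.
payrolls = {f"biz{i}": [f"biz{i}-emp{j}" for j in range(10 + i)] for i in range(8)}
selections = two_stage_sample(payrolls, n_businesses=4, n_employees=5)
```

The sampling unit changes between stages (business, then employee), and each final-stage unit carries a weight reflecting both stages of selection.
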
Multi-phase sampling 


Multi-phase sampling involves collecting basic information from a sample of population
units, then taking a sub-sample of these units (the second-phase sample) to collect more
detailed information. The second-phase sample is selected using the information collected
in the first phase, which allows the second-phase sample to be targeted to the specific
population of interest. Population totals for auxiliary variables, and values from the
first-phase sample, are used to weight the second-phase sample for the estimation of
population totals.


Multi-phase sampling aims to reduce sample size and the respondent burden and collection 
costs, while ensuring that a representative sample is still selected from the population of 
interest. It is often used when the population of interest is small and difficult to isolate in 
advance, or when detailed information is required. Multi-phase sampling is also useful when 
auxiliary information is not known for all of the frame units, as it enables the collection of
data for auxiliary variables in the first-phase sample.


The first-phase sample is designed to be large to ensure sufficient coverage of the
population of interest, but only basic information is collected. The basic information is then
used to identify those first-phase sample units which are part of the population of interest.
A sample of these units is then selected for the second-phase sample. Therefore, the
sampling unit remains the same for each phase of selection. If multi-phase sampling were
not used, detailed information would need to be collected from all first-phase sample units
to ensure reasonable survey estimates. In this way, multi-phase sampling reduces the
overall respondent burden.


Weighting and estimation 


Sample survey data only relate to the units in the sample. Therefore, the sample estimates 
need to be inflated to represent the whole population of interest. Estimation is the means
by which this inflation occurs.


The following section outlines various methods of calculating the population estimates from 
the sample survey data. It then describes various editing procedures used in labour-related
statistics to improve the population estimates.


Estimation is essentially the application of weights to the individual survey records, and
summing these weighted records to estimate totals. The value of these weights is determined
with respect to one or more of the following three factors:


• the probability of selection for each survey unit (probability weighting);

• adjustment for non-response to correct for imbalances in the characteristics of
responding sample units (post-stratification); and

• adjustments to agree with known population totals for auxiliary variables - to correct for
further imbalances in the characteristics of the selected sampled units
(post-stratification, ratio estimation, calibration).


Weights are determined using formulae (estimators) of varying complexity.

Number-raised estimation


Number-raised weights are given by Nh/nh (where Nh is the total number of units in the 
population for the stratum, and nh is the number of responding units in the sample for that 
stratum). The weight assigned to each survey unit indicates the number of units in the 
target population that the survey unit is meant to represent. For example, a survey unit with 
a weight of 100 represents 100 units in the population. Each survey unit in a stratum is given 
the same weight. Number-raised weights can only be used to weight simple random
samples.
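
As a worked illustration of the Nh/nh weight, the sketch below computes number-raised weights for two strata and sums the weighted responses to estimate a population total. The stratum names and figures are invented for the example.

```python
def number_raised_total(population_counts, responses):
    """Estimate a population total using number-raised weights.

    population_counts: dict mapping stratum -> Nh (units in the population).
    responses: dict mapping stratum -> list of values reported by the nh
        responding units (hypothetical layout).
    """
    total = 0.0
    weights = {}
    for stratum, values in responses.items():
        weight = population_counts[stratum] / len(values)  # Nh / nh
        weights[stratum] = weight
        # Every unit in the stratum receives the same weight.
        total += weight * sum(values)
    return total, weights

population_counts = {"small": 1000, "large": 20}
responses = {"small": [3, 5, 4, 8], "large": [250, 310, 190, 400, 350]}
estimate, weights = number_raised_total(population_counts, responses)
```

Each responding small business represents 250 units (1000/4) and each responding large business represents 4 units (20/5), giving an estimated total of 250 x 20 + 4 x 1500 = 11,000.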


Advantages of number-raised estimation are: it does not require auxiliary data; it is
unbiased; and the accuracy of the estimates can be calculated relatively simply. However,
number-raised estimation is not as accurate as some other methods with the same overall
sample size.

Ratio estimation


Ratio estimation involves the use of known population totals for auxiliary variables to 
improve the weighting from sample values to population estimates. It operates by 
comparing the survey sample estimate for an auxiliary variable with the known population 
total for the same variable on the frame. The ratio of the known population total of the
auxiliary variable to its sample estimate is used to adjust the sample estimate for the
variable of interest.


The ratio weights are given by X/x (where X is the known population total for the auxiliary 
variable, and x is the corresponding estimate of the total based on all responding units in 
the sample). These weights assume that the population total for the variable of interest will 
be estimated by the sample equally as well (or poorly) as the population total for the
auxiliary variable is estimated by the sample.
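
The X/x adjustment can be sketched as follows. The starting weights, the choice of frame employment as the auxiliary variable, and all figures are assumptions made for the example.

```python
def ratio_estimate(weights, y, x, known_x_total):
    """Adjust a weighted estimate of y using an auxiliary variable x.

    weights: initial weights of the responding units.
    y: reported values for the variable of interest.
    x: values of the auxiliary variable for the same units.
    known_x_total: X, the known population total of the auxiliary variable.
    """
    x_hat = sum(w * xi for w, xi in zip(weights, x))  # sample estimate of X
    y_hat = sum(w * yi for w, yi in zip(weights, y))  # unadjusted estimate
    ratio = known_x_total / x_hat                     # X / x
    return ratio * y_hat, ratio

# Illustrative data: x is frame employment, y is reported total earnings.
weights = [10, 10, 10]
x = [5, 10, 15]
y = [50, 110, 140]
estimate, ratio = ratio_estimate(weights, y, x, known_x_total=330)
```

The sample estimates the auxiliary total as 300 against a known total of 330, so the unadjusted estimate of 3,000 is scaled up by 1.1 to 3,300.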


Ratio estimation can be more accurate than number-raised estimation if the auxiliary 
variable is highly correlated with the variable of interest. However, it is subject to bias, with 
the bias increasing for smaller sample sizes and where there is lower correlation between
the auxiliary variable and the variable of interest.

Post-stratification


Post-stratification estimation also involves the use of auxiliary information to improve the 
weighting from sample values to population estimates. Subgroups of the survey sample 
units are formed based on auxiliary variables after the survey data have been collected. 
Estimates of subgroup population sizes (based on probability weighting) are compared with 
known subgroup population sizes from independent sources. The ratio of the two 
population sizes for each subgroup is used to adjust the original estimate for the variable of
interest (based on probability weighting).


Post-stratification is used to refine the estimation weighting process by correcting for 
sample imbalance and, assuming that the survey respondents are representative of missing 
units, correcting for non-response. For example, in the LFS, the sample is post-stratified by 
age, sex, Capital city/rest of State, and State/Territory of usual residence. Estimates of the 
number of persons in these subgroups based on Census/Estimated Resident Population 
data are then compared to the estimates based on the survey sample to give the
post-stratification weights.
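
A minimal sketch of the post-stratification adjustment is given below. The record layout, the subgroup labels and the benchmark figures are hypothetical; actual LFS post-strata combine age, sex and region as described above.

```python
def post_stratify(records, benchmarks):
    """Scale probability weights so subgroup totals match known benchmarks.

    records: list of dicts with 'group' and 'weight' keys (hypothetical),
        where 'weight' is the initial probability weight.
    benchmarks: dict mapping group -> known population size.
    """
    # Estimate each subgroup's population size from the initial weights.
    estimated = {}
    for r in records:
        estimated[r["group"]] = estimated.get(r["group"], 0.0) + r["weight"]

    # Multiply each weight by (known size / estimated size) for its subgroup.
    return [
        {**r, "weight": r["weight"] * benchmarks[r["group"]] / estimated[r["group"]]}
        for r in records
    ]

records = [
    {"group": "F 15-24", "weight": 100.0},
    {"group": "F 15-24", "weight": 100.0},
    {"group": "M 15-24", "weight": 100.0},
    {"group": "M 15-24", "weight": 100.0},
    {"group": "M 15-24", "weight": 100.0},
]
benchmarks = {"F 15-24": 250, "M 15-24": 270}
adjusted = post_stratify(records, benchmarks)
```

The under-represented subgroup has its weights raised from 100 to 125, and the over-represented subgroup has its weights lowered from 100 to 90, so the weighted counts agree with the benchmarks.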


Calibration 


Calibration essentially uses all available auxiliary information to iteratively modify the 
original weights (based on number-raised weights). The new weights ensure that the sample 
estimates are consistent with known auxiliary information. Both post-stratification and ratio 
estimation can be used as part of the calibration weighting process. Calibration is useful if 
the survey sample estimates need to match the unit totals for a number of different 
subgroups, or for more than one auxiliary variable. It is mostly used in Special Social 
Surveys. For example, the Survey of Employment and Unemployment Patterns was 
weighted so that the survey estimates aligned with both population estimates based on 
Census data and estimates of the number of persons 'employed', 'unemployed' and 'not in
the labour force' from the LFS.
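
One common way of iteratively modifying weights to match several sets of totals is raking (iterative proportional fitting), sketched below. Raking is used here purely as an illustration of the idea; the source does not state which calibration algorithm the ABS applies, and the record layout and benchmark figures are hypothetical.

```python
def rake(records, margins, iterations=50):
    """Iteratively adjust weights to match known totals for several variables.

    records: list of dicts holding a 'weight' and one key per auxiliary
        variable (hypothetical layout).
    margins: dict mapping variable -> {category: known population total}.
    """
    weights = [r["weight"] for r in records]
    for _ in range(iterations):
        for variable, targets in margins.items():
            # Current weighted total for each category of this variable.
            totals = {}
            for r, w in zip(records, weights):
                totals[r[variable]] = totals.get(r[variable], 0.0) + w
            # Scale weights so this variable's totals match the targets.
            weights = [
                w * targets[r[variable]] / totals[r[variable]]
                for r, w in zip(records, weights)
            ]
    return weights

records = [
    {"sex": "F", "status": "employed", "weight": 100.0},
    {"sex": "F", "status": "unemployed", "weight": 100.0},
    {"sex": "M", "status": "employed", "weight": 100.0},
    {"sex": "M", "status": "unemployed", "weight": 100.0},
]
margins = {
    "sex": {"F": 220, "M": 180},
    "status": {"employed": 300, "unemployed": 100},
}
calibrated = rake(records, margins)
```

After raking, the weighted counts agree with both the sex totals and the labour force status totals at once, which a single post-stratification step on one variable could not guarantee.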


Editing 


Editing is the process of correcting data suspected of being wrong, in order to allow the
production of reliable statistics. The aims of editing are:


• to ensure that outputs from the collection are mutually consistent: for example, two
different methods of deriving the same value should give the same answer;

• to correct for any missing data;

• to detect major errors, which could have a significant effect on the outputs; and

• to find any unusual output values and their causes.


The purpose of editing is to correct non-sampling errors, such as those introduced by 
misunderstanding of questions or instructions, interviewer bias, miscoding, non-availability 
of data, incorrect transcription, non-response, and non-contact. Non-response occurs when 
all (total non-response) or part (partial non-response) of a questionnaire is not completed by
the respondent. High levels of non-response can cause bias in the sample-based estimates.


Editing is also used to identify outliers. The statistical term ‘outlier’ has several definitions, 
depending on the context in which it is used. Here it is used loosely to describe extreme 
values that are verified as being correct, but are very different from the values reported by 
similar units, and are expected to occur only very rarely in the population as a whole. In 
practice, an outlier is usually considered to be a unit that has a large effect on survey 
estimates of level, on estimates of movement, or on the sampling variance. This may occur 
because the unit is not similar to other units in the stratum - for example, if its true
employment is much greater than the frame employment. It may also occur when an
extreme value is recorded for some variable from an otherwise ordinary sampling unit.


Certain types of non-response, and the presence of outliers in the sample, may be
addressed using a variety of statistical techniques.


Imputation involves supplying a value for a non-responding unit, or replacing ‘suspect’
data. Imputation methods fall into three groups:


• the imputed value may be derived from other information supplied by the respondent;

• the imputed value may be derived from information supplied by other similar
respondents in the current survey; and

• the values supplied by the respondent in previous surveys may be modified to derive a
value.


The following imputation methods are used in labour-related surveys: 


Deductive imputation involves correcting a missing or erroneous value by using other 
information that reveals the correct answer. For example, a response of 18,000 has been 
given where respondents have been asked to reply in '$000s' and where the expected 
range of responses is 13-21. A quick examination of other parts of the form shows that 
$18,000 is very likely the amount actually spent by the respondent, so 18,000 is 
‘corrected’ to 18. 


Central-value imputation involves replacing a missing or erroneous item with a value 
considered to be 'typical' of the sample or sub-sample concerned. Live respondent mean 
is an example of central-value imputation. This technique involves calculating the 
average stratum value for the data item of interest across all responding live units in the 
stratum, and assigning this value to all live non-responding units in the stratum. 


Hot-deck imputation is similar to central-value imputation, but takes the absolute value 
from a donor unit: for example, earnings per hour for a given combination of occupation, 
location and industry in Characteristics of Employment. 


Cold-deck imputation involves using previous survey data to amend items which fail 
edits. It may involve copying data from the previous survey cycle to the current cycle. 
One specific example of this type of imputation is Beta imputation, which involves 
estimating missing values by applying an imputed growth rate to the most recently 
reported data for these units, provided that data have been reported in either of the two 
previous periods. 
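
Two of these methods can be sketched briefly: live respondent mean imputation, and a simplified growth-rate imputation in the spirit of Beta imputation. In the second function, the growth rate is taken as the ratio of current to previous totals among units that reported in both periods; that rule is an assumption made for illustration, not the ABS formula, and the data layouts are hypothetical.

```python
def live_respondent_mean(stratum_values):
    """Replace missing values (None) with the mean of responding units."""
    reported = [v for v in stratum_values if v is not None]
    mean = sum(reported) / len(reported)
    return [mean if v is None else v for v in stratum_values]

def growth_rate_impute(previous, current):
    """Impute missing current values from previously reported values.

    previous, current: dicts mapping unit -> value, or None if not reported.
    The growth rate used here (an assumed rule) is the ratio of current to
    previous totals over units reporting in both periods.
    """
    both = [u for u in current
            if current[u] is not None and previous.get(u) is not None]
    growth = sum(current[u] for u in both) / sum(previous[u] for u in both)
    imputed = {}
    for unit, value in current.items():
        if value is None and previous.get(unit) is not None:
            imputed[unit] = previous[unit] * growth
        else:
            imputed[unit] = value
    return imputed

filled = live_respondent_mean([10, None, 20, 30])
carried = growth_rate_impute(
    previous={"a": 100, "b": 200, "c": 50},
    current={"a": 110, "b": 220, "c": None},
)
```

In the first case the non-responding unit receives the stratum mean of 20. In the second, units "a" and "b" grew by 10 per cent, so unit "c" is imputed as 50 x 1.1 = 55.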


When adjusting for outliers, a compromise is always necessary between the variability and
bias associated with an estimate. There are two methods available for dealing with outliers.
Historically the ABS has used the 'surprise outlier' approach for most business surveys, but
over time has gradually changed to using 'winsorization'.


Surprise outlier approach - Generally, this technique is used to deal with a selected unit 
which is grossly extreme for a number of variables. The approach treats each outlier as if 
it were the only extreme unit in the stratum population. The outlier is given a weight of 
one, as if it had been selected in a completely enumerated (CE) stratum. As a result of the
outlier's movement to the CE stratum, the weight for units in the outlier's selection stratum
has to be recalculated, as the population and sample size have effectively been reduced by one.
This has the effect that the other population units which would have been represented
by the outlier are now represented by the average of the other units in the stratum.
Therefore, the choice of treatment for a suspected outlier using the surprise outlier
approach is either for it to represent all of the units it would normally represent, or to
represent no units other than itself. It is preferable to set a maximum number of 
surprise outliers which can be identified in any one survey. 


Winsorization technique - This technique is a more flexible approach. Here a value is 
considered to be an outlier if it is greater than a predetermined cut off. The effect of the 
outlier on the estimates is reduced by modifying its reported value. On application of the 
winsorization formula, sample values greater than the cut off are replaced by the cut off 
plus a small additional amount. The additional amount is the difference between the 
sample value and the cut off, multiplied by the stratum sampling fraction. Thus 
winsorization has most impact in strata with low sampling fractions, and the impact 
decreases as sampling fractions increase. Effectively, winsorization results in the outlier 
only representing itself, with the remaining population units that would have been 
represented by the outlier being instead represented by the cut off. 
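
Both treatments can be expressed in a few lines. The sketch follows the descriptions above: a surprise outlier receives a weight of one and the remaining weights are recalculated on a population and sample each reduced by one, while winsorization replaces a value above the cut off with the cut off plus the excess multiplied by the sampling fraction. Function names and figures are hypothetical.

```python
def surprise_outlier_weights(population_size, sample_size, n_outliers=1):
    """Weights after moving surprise outliers out of their selection stratum."""
    outlier_weight = 1.0  # the outlier represents only itself
    # Remaining units share a population and sample reduced by the outliers.
    other_weight = (population_size - n_outliers) / (sample_size - n_outliers)
    return outlier_weight, other_weight

def winsorize(value, cutoff, sampling_fraction):
    """Reduce a value above the cut off; leave other values unchanged."""
    if value <= cutoff:
        return value
    return cutoff + sampling_fraction * (value - cutoff)

# Illustrative stratum: 100 units in the population, 10 in the sample.
N, n = 100, 10
outlier_weight, other_weight = surprise_outlier_weights(N, n)
modified = winsorize(5000, cutoff=1000, sampling_fraction=n / N)
weighted_contribution = (N / n) * modified
```

With a weight of 10, the winsorized value of 1,400 contributes 14,000 to the estimate. That equals the outlier's own 5,000 plus nine other units at the cut off of 1,000, which matches the interpretation given above.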


Time series estimates 


Time series are statistical records of various activities measured at regular intervals of time, 
over relatively long periods. Data collected in irregular surveys do not form time series. The 
following section outlines the various elements of time series, and describes the ABS
method of calculating seasonally adjusted and trend estimates.


ABS time series statistics are published in three forms: original, seasonally adjusted and
trend.


Original estimates are the actual estimates the ABS derives from the survey data or other
non-survey sources. Original estimates comprise trend behaviour, systematic
calendar related influences, and irregular influences.


Systematic calendar related influences operate in a sustained and systematic manner that is 
calendar related. The two most common of these influences are seasonal influences and
trading day influences.

Seasonal influences occur for a variety of reasons:


• They may simply be related to the seasons and related weather conditions, such as
warmth in summer and cold in winter. Weather conditions that are out of character for a
particular season, such as snow in summer, would appear as irregular, not seasonal,
influences.

• They may reflect traditional behaviour associated with various social events (e.g.
Christmas and the associated holiday season).

• They may reflect the effects of administrative procedures (e.g. quarterly provisional tax
payments and end of financial year activity).


Trading day influences refer to activity associated with the number and types of days in a
particular month, as different days of the week often have different levels of activity. For
instance, a calendar month typically comprises four weeks (28 days) plus an extra two or
three days. If these extra days are associated with high activity, then activity for the month
overall will tend to be higher.


Seasonal and trading day factors are estimates of the effect that the main systematic 
calendar related influences have on ABS time series. These evolve to reflect changes in 
seasonal and trading patterns of activity over the life of the time series, and are used to
remove the effect of seasonal and trading day influences from the original estimates.


Seasonally adjusted estimates are derived by removing the systematic calendar related 
influences from the original estimates. Seasonally adjusted estimates capture trend 
behaviour, but still contain irregular influences that can mask the underlying month to 
month or quarter to quarter movement in a series. Seasonally adjusted estimates by
themselves are only relevant for sub-annual collections.


Irregular influences are short term fluctuations which are unpredictable, and hence are not 
systematic or calendar related. Examples of irregular influences are those caused by one-off 
effects such as major industrial disputes or abnormal weather patterns. Sampling and
non-sampling errors that behave in an irregular or erratic fashion with no noticeable systematic
pattern are also irregular influences.


Trend estimates are derived by removing irregular influences from the seasonally adjusted 
estimates. As they do not include systematic, calendar related influences or irregular 
influences, trend estimates are the best measure of the underlying behaviour of the series,
and the labour market.


Trend estimates are produced by smoothing the seasonally adjusted series using a
statistical procedure based on Henderson moving averages. At each survey cycle, the trend
estimates are calculated using a centred x-term Henderson moving average of the
seasonally adjusted series. The moving averages are centred on the point in time at which
the trend is being estimated. The number of terms used to calculate the trend estimates
varies across surveys. Generally, ABS monthly surveys use a 13-term Henderson moving
average, and quarterly surveys use a 7-term Henderson moving average.
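
The centred smoothing step can be sketched in Python as follows. The weights come from the
standard closed-form expression for symmetric Henderson filters; the function names
henderson_weights and centred_trend are hypothetical, and the sketch leaves the end points
empty rather than applying the asymmetric averages used in practice for recent cycles. It
is an illustration under those assumptions, not the ABS production procedure.

```python
def henderson_weights(terms):
    """Symmetric Henderson moving average weights for an odd number of terms."""
    if terms < 5 or terms % 2 == 0:
        raise ValueError("terms must be an odd integer of at least 5")
    m = (terms - 1) // 2          # number of points either side of the centre
    n = m + 2
    denom = (8 * n * (n**2 - 1) * (4 * n**2 - 1)
             * (4 * n**2 - 9) * (4 * n**2 - 25))
    weights = []
    for j in range(-m, m + 1):
        num = (315 * ((n - 1)**2 - j**2) * (n**2 - j**2)
               * ((n + 1)**2 - j**2) * (3 * n**2 - 16 - 11 * j**2))
        weights.append(num / denom)
    return weights


def centred_trend(series, terms=13):
    """Apply a centred Henderson moving average to a seasonally adjusted series.

    Points too close to either end for a full centred window are left as None;
    this sketch does not implement asymmetric end-point averages.
    """
    weights = henderson_weights(terms)
    m = (terms - 1) // 2
    trend = [None] * len(series)
    for t in range(m, len(series) - m):
        window = series[t - m:t + m + 1]
        trend[t] = sum(w * x for w, x in zip(weights, window))
    return trend


if __name__ == "__main__":
    # Hypothetical monthly seasonally adjusted values, for illustration only.
    adjusted = [100 + 0.5 * t + (1 if t % 3 == 0 else -1) for t in range(24)]
    print(centred_trend(adjusted, terms=13))
```

With 13 terms, the first and last six points of the series have no centred estimate, which
is why the most recent cycles need a different treatment.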


Estimates for the most recent survey cycles cannot be directly calculated using the centred
moving average method, as there are insufficient data to do so. Instead, alternative
approaches that approximate the smoothing properties of the Henderson moving average
are used, such as asymmetric averages. This can lead to revisions in the trend estimates for
the most recent survey cycles, until sufficient data are available to calculate the trend using
the centred Henderson moving average. Revisions of trend estimates will also occur with
revisions to the original data and re-estimation of seasonal adjustment factors.


Reliability of estimates 


The accuracy of an estimate refers to how close that estimate is to the true population
value. Where there is a discrepancy between the value of the sample estimate and the true
population value, the difference between the two is referred to as the 'error of the sampling
estimate'. The total error of the survey estimate results from two types of error:


• sampling error - errors which occur because data were obtained from only a sample
rather than the entire population, and


• non-sampling error - errors which occur at any stage of a survey, and can also occur in
censuses.


Sampling error 


Sampling error equals the difference between the estimate obtained from a particular
sample, and the value that would be obtained if the whole survey population were
enumerated. It is important to consider sampling error when publishing survey results as it
gives an indication of the accuracy of the estimate, and therefore reflects the importance
that can be placed on interpretations. For a given estimator and sample design, the
expected size of the sampling error is affected by the sample size and by how similar the
units in the target population are.


Variance


Variance is a measure of sampling error that is defined as the average of the squares of the
deviation of each possible estimate (based on all possible samples for the same design)
from the expected value. It gives an indication of how accurate the survey estimate is likely
to be, by measuring the spread of estimates around the expected value. For probability
sampling, an estimate of the variance can be calculated from the data values in the
particular sample that is generated.


Methods used to calculate estimates of variance in ABS labour-related surveys are outlined
below.


• Jack-knife: This method starts by dividing the survey sample into a number of equally
sized groups (replicate groups), containing one or more units. Pseudo-estimates of the
population total are then calculated from the sample by excluding each replicate group
in turn. The jack-knife variance is derived from the variation of the respective
pseudo-estimates around the estimate based on the whole sample. This method is used in a
number of household surveys, including the LFS (from November 2002), supplementary
surveys (from August 2005), the Multipurpose Household Survey (MPHS) and some
labour-related business surveys.


• Bootstrap: The Bootstrap is a variance estimation method which relies on the use of
replicate samples, essentially sampling from within the main sample. Each of these
replicate samples is then used to calculate a replicate estimate and the variation in these
replicate estimates is used to calculate the variance of a particular estimate.


• Ultimate cluster variance: This method is used in some multi-stage sampling, and
involves using the variation in estimates derived from the first-stage units to estimate the
variance of the total estimate. This method is used in the Survey of Employee Earnings
and Hours.


• Split halves: This method involves dividing the sample into half and, from each half,
obtaining an independent estimate of the total. The variance estimate is produced using
the square of the difference of these estimates. Variations of the split halves method for
calculating variance estimates were used in a number of household surveys, including
the LFS prior to November 2002 and supplementary surveys prior to August 2005.
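
The jack-knife steps described above can be sketched in Python as a simple
delete-a-group calculation. The function name jackknife_variance, the systematic allocation
of units to replicate groups, and the G/(G-1) reweighting are assumptions made for
illustration; the source does not specify how ABS surveys form replicate groups or adjust
weights.

```python
def jackknife_variance(values, weights, n_groups):
    """Delete-a-group jack-knife variance of a weighted population total.

    values   -- reported value for each sample unit
    weights  -- survey weight for each sample unit
    n_groups -- number of equally sized replicate groups (assumed layout:
                unit i is placed in group i % n_groups)
    Returns (estimate from the whole sample, jack-knife variance).
    """
    n = len(values)
    if n != len(weights):
        raise ValueError("values and weights must have the same length")
    if n_groups < 2 or n % n_groups != 0:
        raise ValueError("sample must split into at least 2 equal groups")

    full_estimate = sum(w * y for y, w in zip(values, weights))

    # Remaining units are weighted up to compensate for the excluded group.
    scale = n_groups / (n_groups - 1)
    pseudo_estimates = []
    for g in range(n_groups):
        kept = sum(w * y
                   for i, (y, w) in enumerate(zip(values, weights))
                   if i % n_groups != g)
        pseudo_estimates.append(scale * kept)

    # Variation of the pseudo-estimates around the whole-sample estimate.
    variance = ((n_groups - 1) / n_groups
                * sum((p - full_estimate) ** 2 for p in pseudo_estimates))
    return full_estimate, variance


if __name__ == "__main__":
    # Hypothetical sample of four units, each representing ten population units.
    estimate, variance = jackknife_variance([1, 2, 3, 4], [10, 10, 10, 10], 4)
    print(estimate, variance, variance ** 0.5)
```

The square root of the returned variance gives a standard error for the estimated total.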


The variances indicated in ABS household survey publications are generally based on
models of each survey's variance. The variances for a range of estimates are calculated
using one of the above methods, and a curve is fitted to the results. This curve indicates the
level of variance which could be expected for a particular size of estimate.


Standard Error (SE) 


The most commonly used measure of sampling error is called the standard error (SE). The
SE is equal to the square root of the variance. An estimate of the SE can be derived from
either the population variance (if known) or the estimated variance from the sample units.
Any estimate derived from a probability based sample survey has an SE associated with it
(called the SE of the estimate). The main features of SEs are set out below.


• SEs indicate how close survey estimates are likely to be to the expected population
values that would be obtained from a census conducted under the same procedures and
processes;


• SEs provide measures of variation in estimates obtained from all possible samples under
a given design;


• Small SEs indicate that variation in estimates from repeated samples is small, and it is
likely that sample estimates will be close to the true population values, regardless of the
sample selected;


• Estimates of SEs can be obtained from any probability sample - different random
samples will produce different estimates of SEs;


• SEs calculated from survey samples are themselves estimates, and thus also subject to
SEs;


• When comparing survey estimates, statements should be made about the SEs of those
estimates; and


• SEs can be used to work out confidence intervals. This concept is explained below.


Confidence Interval (CI)


A confidence interval (CI) is defined as an interval, centred on the estimate, with a
prescribed level of probability that it includes the true population value (if the estimator is
unbiased), or the mean of the sampling distribution (if the estimator is biased). Estimates
from ABS surveys are usually unbiased.


Estimates are often presented in terms of a CI. Most commonly, CIs are constructed for
66%, 95%, and 99% levels of probability. The true value is said to have a given probability of
lying within the constructed interval. For example:


• 66% chance that the true value lies within 1 standard error of the estimate (2 chances in
3);


• 95% chance that the true value lies within 2 standard errors of the estimate (19 chances
in 20); and


• 99% chance that the true value lies within 3 standard errors of the estimate (99 chances
in 100).


CIs are constructed using the standard error associated with an estimate. A 95% CI, for
instance, is equivalent to the survey estimate, plus or minus two times the standard error of
the estimate. To illustrate, the originally published LFS estimate of employment (seasonally
adjusted) for September 2017 was 12,290,200 persons, and the estimate had a standard
error of 44,400. The 95% CI could be expressed: "we are 95% confident that the true value
for employment lies between 12,201,400 and 12,379,000".
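
The arithmetic behind this interval can be reproduced with a short Python sketch. The
function name confidence_interval and its n_se parameter are hypothetical; the multipliers
of 1, 2 and 3 standard errors follow the rounded convention used in the text above.

```python
def confidence_interval(estimate, standard_error, n_se=2):
    """Return (lower, upper) bounds of estimate +/- n_se standard errors.

    n_se = 1, 2 or 3 corresponds to the 66%, 95% and 99% levels
    quoted in the text.
    """
    margin = n_se * standard_error
    return estimate - margin, estimate + margin


if __name__ == "__main__":
    # LFS employment estimate (seasonally adjusted), September 2017.
    lower, upper = confidence_interval(12_290_200, 44_400, n_se=2)
    print(f"95% CI: {lower:,} to {upper:,}")
```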


Relative Standard Error (RSE) 


Another measure of sampling error is the relative standard error (RSE). This is the standard
error expressed as a percentage of the estimate. Since the standard error of an estimate is
generally related to the size of the estimate, it is not possible to deduce the accuracy of the
estimate from the standard error without also referring to the size of the estimate. The
relative standard error avoids the need to refer to the estimate, since the standard error is
expressed as a proportion of the estimate. RSEs are useful when comparing the variability
of population estimates of different sizes. They are commonly expressed as percentages.


Very small estimates are subject to high RSEs, which detract from their usefulness. In some
ABS labour-related statistical publications, estimates with an RSE greater than 25% but less
than 50% have an asterisk (*) displayed beside the estimate, indicating they should be used
with caution. Estimates with an RSE greater than 50% have two asterisks (**) displayed
beside the estimate, indicating they are so unreliable as to detract seriously from their value
for most reasonable uses. All cells in a Data Cube with RSEs greater than 25% contain a
comment indicating the size of the RSE. These cells are identified by a red indicator in the
corner of the cell. The comment appears when the mouse pointer hovers over the cell.
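
A minimal Python sketch of the RSE calculation and the asterisk annotations follows. The
function names are hypothetical, and the treatment of an RSE of exactly 50% (flagged here
with a single asterisk) is an assumption, since the text only covers values below and
above that threshold.

```python
def relative_standard_error(estimate, standard_error):
    """Standard error expressed as a percentage of the estimate."""
    if estimate == 0:
        raise ValueError("RSE is undefined for an estimate of zero")
    return 100 * standard_error / abs(estimate)


def reliability_flag(rse_percent):
    """Return the annotation shown beside an estimate for a given RSE (%).

    ''   -- RSE of 25% or less
    '*'  -- RSE greater than 25% (use with caution)
    '**' -- RSE greater than 50% (too unreliable for most uses)
    An RSE of exactly 50% is given '*' here; that boundary is an assumption.
    """
    if rse_percent > 50:
        return "**"
    if rse_percent > 25:
        return "*"
    return ""


if __name__ == "__main__":
    # LFS employment estimate for September 2017 and its standard error.
    rse = relative_standard_error(12_290_200, 44_400)
    print(f"RSE = {rse:.2f}%  flag = '{reliability_flag(rse)}'")
```
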


Non-sampling error


Non-sampling error refers to all other errors in the estimate. Non-sampling error can be
caused by non-response, badly designed questionnaires, respondent bias, interviewer bias,
collection bias, frame deficiencies and processing errors. It is often difficult and expensive to
quantify non-sampling error.


Non-sampling errors can occur at any stage of the process, and in both censuses and
sample surveys. Non-sampling errors can be grouped into two main types: systematic and
variable. Systematic error (called bias) makes survey results unrepresentative of the
population value by systematically distorting the survey estimates. Variable error can distort
the results on any given occasion, but tends to balance out on average over time.


Every effort is made to minimise non-sampling error in ABS surveys at every stage of the
survey, through careful design of collections, and the use of rigorous editing and quality
control procedures in the compilation of data. Some of the approaches adopted are listed
below.


• Reducing frame deficiencies.


• Reducing non-response - Non-response results in bias in the estimate because it is
possible the non-respondents have different characteristics to respondents, leading to
an under-representation of the characteristics of non-respondents in the sample survey
estimate. The ABS pursues a policy of intensive follow up of non-respondents. This
includes multiple visits or telephone calls in an attempt to contact respondents, and
letters requesting compliance with the survey. Partial non-response is also followed up
with respondents.


• Reducing instrument errors - These errors relate to poor questionnaire design, leading to
questions which are not easily understood by respondents, and hence incorrect
responses. This is particularly relevant for household surveys. The ABS ensures that all
household survey questionnaires are carefully tested using cognitive testing and dress
rehearsals of the survey before it is officially conducted. New business survey
questionnaires and additional questions in business surveys are also rigorously tested
before they are introduced.


Measures of non-sampling error 


Non-sampling error is difficult to quantify; however, an indication of the level of
non-sampling error can be determined from a number of quality measures. These include:


• Response rates: The number of responding units in a survey expressed as a proportion
of the total number of units selected (excluding deaths). Response rates can also be
calculated for individual questions within a survey.


• Imputation rates: The number of responses which need to be imputed expressed as a
proportion of the total number of responses.


• Coverage rates: An estimate of the proportion of units in the target population which are
not covered by the frame.


• Any Responsible Adult rates: The number of responding units in a survey for which
information was supplied by a responsible adult rather than personally, expressed as a
proportion of the total number of responding units. Any Responsible Adult rates can only
be calculated for household surveys.
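
The first two quality measures are simple ratios, sketched below in Python. The function
and parameter names are hypothetical, and the counts in the usage example are invented
purely to show the arithmetic.

```python
def response_rate(responding, selected, deaths=0):
    """Responding units as a proportion of selected units, excluding deaths."""
    eligible = selected - deaths
    if eligible <= 0:
        raise ValueError("no eligible units after excluding deaths")
    return responding / eligible


def imputation_rate(imputed, total_responses):
    """Imputed responses as a proportion of all responses."""
    if total_responses <= 0:
        raise ValueError("total_responses must be positive")
    return imputed / total_responses


if __name__ == "__main__":
    # Invented counts: 1,020 units selected, 20 deaths, 900 responding,
    # of which 45 responses needed imputation.
    print(f"Response rate:   {response_rate(900, 1020, deaths=20):.1%}")
    print(f"Imputation rate: {imputation_rate(45, 900):.1%}")
```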


Confidentiality 


All releases of data from the ABS are confidentialised to ensure that no unit (e.g. person or
business) is able to be identified. The ABS applies a set of rules, concerning the minimum
number of responses required to contribute to each data cell of a table, and the maximum
proportion that any one respondent can contribute to a table cell, to ensure that
information about specific units cannot be derived from published survey results.


In some instances it is not possible to confidentialise responses from businesses that
contribute substantially to a data cell. In this case, agreement is sought from the business
for their data to still be published. If agreement is not reached, all affected data cells are
suppressed.


Under the Census and Statistics Act 1905, it is an offence to release any information
collected under the Act that is likely to enable identification of any particular individual or
organisation. Introduced random error is used to ensure that no data are released which
could risk the identification of individuals in the statistics.


A technique, known as perturbation, has been developed to randomly adjust cell values.
Random adjustment of the data is considered to be the most satisfactory technique for
avoiding the release of identifiable data. When the technique is applied, all cells are slightly
adjusted to prevent any identifiable data being exposed. These adjustments result in small
introduced random errors. However, the information value of the table as a whole is not
impaired.


These adjustments may cause the sum of rows or columns to differ by small amounts from
table totals. The counts are adjusted independently in a controlled manner, so the same
information is adjusted by the same amount. However, tables at higher geographic levels
may not be equal to the sum of the tables for the component geographic units.
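
The two properties described above (the same information always receives the same
adjustment, and independently adjusted cells need not add to the adjusted total) can be
illustrated with a toy Python sketch. This is not the ABS perturbation algorithm, which the
text does not describe; the function name perturb, the hash-based adjustment, the maximum
shift of 2 and the cell labels are all assumptions made for illustration.

```python
import hashlib


def perturb(cell_key, count, max_shift=2):
    """Toy perturbation: shift a count by a small amount derived from its key.

    The shift is a deterministic function of cell_key, so the same cell
    receives the same adjustment wherever it is published. The shift lies
    between -max_shift and +max_shift, and the result is never negative.
    """
    digest = hashlib.sha256(cell_key.encode("utf-8")).digest()
    shift = digest[0] % (2 * max_shift + 1) - max_shift
    return max(0, count + shift)


if __name__ == "__main__":
    # Invented counts for two component regions and their combined total.
    cells = {"region A": 120, "region B": 85}
    total = sum(cells.values())

    adjusted = {key: perturb(key, value) for key, value in cells.items()}
    adjusted_total = perturb("all regions", total)

    # The total is adjusted independently, so it may differ slightly from
    # the sum of the adjusted component cells.
    print(adjusted, sum(adjusted.values()), adjusted_total)
```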


It is not possible to determine which individual figures have been affected by random error
adjustments, but the small variance which may be associated with derived totals can, for the
most part, be ignored.


